[TLDR] Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
A multi-agent approach for scalable knowledge integration surpassing LLM context window limits
Authors: Z. Liu et al. Published on Arxiv: 2025-05-27 Link: http://arxiv.org/abs/2505.21471v1 Institutions: Tsinghua University, Department of Computer Science & Technology, Institute for AI • Institute for AI Industry Research (AIR), Tsinghua University • Tongyi Lab, Alibaba Group Keywords: Large Language Models, multi-agent systems, context window extension, knowledge integration, retrieval-augmented generation, distributed reasoning, question answering, survey generation, scalability, global synchronization
Large Language Models (LLMs) have made notable progress in expanding context windows, yet real-world tasks often require integrating far more external knowledge than these models can accommodate. Existing methods to overcome context window limitations—such as retrieval-augmented generation and current multi-agent systems—result in information loss or fail to scale efficiently as input size grows.
To address these challenges, the authors introduce a novel method, detailed below:
- EXTAGENTS is a multi-agent collaboration framework that partitions vast external knowledge into manageable context chunks, assigning each portion to a ‘Seeking Agent’, whose summarized outputs are then integrated by a central ‘Reasoning Agent’.
- The framework features global knowledge synchronization across agents on a shared scratchpad, and implements knowledge-accumulating reasoning that incrementally aggregates relevant information to avoid overload and maximize relevance.
- Benchmarking is performed across multi-hop QA, long survey generation, and direct context tasks in both English and Chinese, compared with state-of-the-art retrieval and multi-agent baselines.
The results section further substantiates the framework’s effectiveness:
- EXTAGENTS consistently outperforms previous context window extension methods, scaling with input size and exceeding LLM window limits by handling up to 1024k tokens.
- It demonstrates superior F1 and Helmet correctness on ∞Bench+ and HotpotQA, and achieves higher quality and diversity in survey generation with better efficiency and lower latency.
- Ablation studies confirm the necessity of both global synchronization and the incremental reasoning process.
Building upon these findings, final conclusions are drawn:
- EXTAGENTS removes scalability bottlenecks at inference-time, allowing LLM task performance to benefit from increasingly large external knowledge without retraining.
- The framework is model- and task-agnostic, supporting parallelism and generalization.
- Future work will target dynamic agent orchestration, support for multi-modal/tool reasoning, robust chunking, and theoretical analysis of information robustness and safety.